Density matrices in 0(N) electronic structure calculations: theory and applications 
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We analyze the problem of determining the electronic ground state within O(N) schemes, focusing 
on methods in which the total energy is minimized with respect to the density matrix. We note that 
in such methods a crucially important constraint is that the density matrix must be idempotent 
(i.e. its eigenvalues must all be zero or unity). Working within orthogonal tight-binding theory, 
we analyze two related methods for imposing this constraint: the iterative purification strategy of 
McWeeny, as modified by Palser and Manolopoulos; and the minimization technique of Li, Nunes 
and Vanderbilt. Our analysis indicates that the two methods have complementary strengths and 
weaknesses, and leads us to propose that a hybrid of the two methods should be more effective than 
either method by itself. This idea is tested by using tight-binding theory to apply the proposed 
hybrid method to a set of condensed matter systems of increasing difficulty, ranging from bulk 
crystalline C and Si to liquid Si, and the effectiveness of the method is confirmed. The implications of 
our findings for O(N) implementations of non-orthogonal tight-binding theory and density functional 
theory are discussed. 



I. INTRODUCTION 

The last few years have seen an upsurge of inter- 
est in O(N) electronic-structure methods for treating 
condensed matter both within tight-binding theory and 
within density functional theoryuLJ. In these methods, 
the number of computer operations needed to determine 
the electronic ground state is proportional to the num- 
ber N of atoms in the system, instead of showing the N 2 
or iV 3 dependence characteristic of traditional methods. 
O(N) methods arej-apasible because electronic phase co- 
herence is iocalisecOO't 3 ! This localisation property can 
be expressed by saying that the density matrix p decays 
to zero with increasing distance. 

Some practical 0(N) methods, pcplpit the locality of 
the density matrix directlyO'Enl3llj'E3oH, and have been 
shown to work particularly efficiently for systems with a 
gapo They determine the ground state by minimising 
the total energy with respect to p, with the approxima- 
tion that p is set equal to zero for spatial separations 
exceeding some cut-off R c . A central problem in any 
such approach is that a density matrix must be a pro- 
jector: it is the operator that projects onto the space of 
occupied states. Equivalently, one can say that p must 
be idempotent - its eigenvalues must be zero or unity. In 
practice, the requirement of idempotency is difficult to 
enforce. 

A number of t<xjmijaues have been suggested for enfor.c=. 
ing idempotencyD'OEj ^. Many years ago, McWeenyO 
proposed an iterative technique known as 'purification'. 
The basic idea is that an algorithm is used to transform 
a nearly idempotent operator p into another operator p 
that is even more nearly idempotent. Repetition of this 
transformation yields a p that is idempotent to any de=. 
sired precision. Later, Li, Nunes and Vanderbilt (LNV)u 
developed a related method using the same transforma- 
tion, in which the total energy is minimised with respect 



to p. 

The purpose of this paper is to present arguments for 
combining these approaches, which we shall refer to as 
the McWeeny and LNV approaches. We point out that 
the weaknesses of each approach are matched by the 
strengths of the other. The fundamental thought here is 
that McWeeny is good for finding density matrices that 
are idempotent, while LNV is good for searching through 
idempotent density matrices to find the one that yields 
the true ground state. We shall demonstrate that this 
complementarity can be formulated in a precise and ele- 
gant way. 

In the next section, we recall the main ideas of the 
McWeeny and LNV methods for determining the elec- 
tronic ground state. Section III is the heart of the paper, 
where we present our formulation of the complementar- 
ity between McWeeny and LNV, and we point out its 
implications for ground-state search strategy. Practical 
illustrations of the benefits obtained by combining the 
two app roaches are presented in Sec. IV, and in Sees. [V| 
and VI we give discussion and conclusions. 



II. THE MCWEENY AND LNV METHODS 

The arguments we shall develop are very general, but 
for simplicity we present them in the framework of or- 
thogonal tight-binding theory. We comment later on 
the extensions to the non-orthogonal case and to density 
functional theory. 

In the orthogonal tight-binding basis, the matrix ele- 
ments of the Hamiltonian are denoted by Hij , where i 
and j go over all basis functions on all atoms (1 < i,j < 
Nb, where Nb is the total number of basis functions in 
the system). If there are No occupied states, then the 
ground-state energy Eq is given by: 
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Eq = minTr (Hp) 



(1) 



where the minimisation is performed with respect to all 
Hermitian matrices p, subject to the conditions that p is 
a projector (p 2 = p) and that Trp = Nq. Note that the 
factor of two due to spin is omitted for simplicity. The 
ground-state density matrix is given by: 



9(jmI - H) , 



(2) 



where 9(x) is the Heaviside function (9 — 1 for x > and 
6 = for x < 0) and p is the chemical potential (Fermi 
energy). Instead of working at fixed number of occupied 
states, it is sometimes more convenient to work at fixed 
chemical potential, in which case we minimise the grand 
potential Q: 



Oo = rnin Tr (H — pl)p 



(3) 



subject only to the condition p 2 = p. 

If one was allowed to diagonalise H, then p could be 
straightforwardly expressed as: 



Pij — c inCj n , 



(4) 



where Ci n are the eigenvector components of H: 



(5) 



and the eigenvalues e„ are assumed to be in ascending 
order. But diagonalisation is an 0(N 3 ) process and is 
incompatible with linear scaling. 



As this process continues, the final approach to idempo- 
tency accelerates rapidly. If A deviates from zero by a 
small amount, then the deviation of A is proportional to 
the square of that amount, and similarly for deviations 
from unity. This quadratic convergence to idempotency 
can be summarised by noting that the matrices p 2 — p 
and p 2 — p are related by: 



p 2 -p = 4(p 2 -pf-3(p 2 -p) 2 



(8) 



For a given initial matrix p^ , iteration of the purifica- 
tion algorithm therefore generates a sequence /j' 1 ) , p^ 
that converges to an idempotent matrix p(°°\ However, 
there are many idempotent matrices, and p^°°> is not 
necessarily the idempotent matrix given by eqn (^) . The 
latter is uniquely specified by the statements that (a) p 
commutes with H; (b) in a representation that diago- 
nalises p and H , the eigenvalues A„ of p (which by idem- 
potency must be or 1) are given by A„ = 1 for e„ < p 
and A„ = for e„ > p. 

It has been, emphasised recently by Palser and 
Manolopoulosc3 (hereafter PM) that McWeeny purifi- 
cation automatically delivers the correct ground state 
provided the initial p is an appropriate function of the 
Hamiltonian whose eigenvalues are in the range (0,1). 
The initial p then commutes with H, and this means 
that all subsequent p^ commute with H. Provided the 
eigenvalues of p^ satisfy < A„ < | for e„ > p and 
1 > A n > | for e n < p, then repeated purification auto- 
matically yields a final p representing the exact ground 
state. PM point out that p^ satisfies the requirements 
if it is chosen as: 



S 0) = U(pI-H) + ±I 



(9) 



A. McWeeny purification 

The McWeeny purification scheme@ gives a way of 
achieving idempotency without diagonalisation. The pu- 
rification algorithm for mapping a nearly idempotent ma- 
trix p into one that is more nearly idempotent is: 



where 



3^ - 2p 



(6) 



The way this mapping works can be understood by con- 
sidering the eigenvalues of p and p, denoted by A, A re- 
spectively. Since p is diagonal in any representation that 
diagonalises p, the relationship between their eigenvalues 
is: 



A = 3A - 2A J 



(7) 



From the form of the function /(A) = 3A 2 — 2A 3 (see Fig- 



ure [j]), it follows that if — i 



< A < § then < A < 1. 

Furthermore if < A < 1 then for A < |, A < A 

and for A > |, A > A, so that iteration of the map- 
ping drives the eigenvalues towards the values or 1. 



(10) 



with -f/min, fl max lower and upper bounds on the eigen- 
value spectrum of H . A crucial feature of this procedure 
is that the grand potential converges monotonically 
to the ground state value from above: Q( n+1 *> < Q( n >. 
PM further show that a purification procedure working 
at fixed No can be obtained with a modified version of the 
McWeeny algorithm; details are given in their paper. In 
this case, the energy converges monotonically from 
above: < E (n K 

The McWeeny purification scheme, as modified by PM, 
should therefore provide an extremely effective method 
for determining the ground state, provided everything is 
done exactly. But the essence of O(N) methods is the 
imposition of a spatial cut-off on the density matrix. This 
means that at each purification step the input matrix p^ 
will be non-zero only if j is one of a spatially localised set 
of neighbours of i. Because of the matrix multiplications, 
the output matrix p^ will have a more extended range, 
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so that it must be truncated back to the imposed range 
before each iteration. Because of this truncation, the 
monotonic convergence of 17 or E must fail as the ground 
state is approached, and PM suggest that this failure of 
monotonicity be used as a criterion for terminating the 
iterative process. In other words, should the energy for 
any given iteration be higher than that from the previous 
iteration, the process should be stopped. This is an im- 
portant criterion to follow, as once the truncation errors 
become significant, the iteration will no longer converge 
on an idempotent matrix. 

This heuristic criterion for terminating the iterations 
may work in some cases, but we consider it to be unsatis- 
factory for two reasons. First, it makes the approximate 
ground-state energy depend on the details of the initial 
p^ . Second, the energy is not the minimum of any func- 
tion, so that the variational property of the exact ground 
state is lost. This means that calculated forces on atoms 
will not be consistent with the energy, so that both relax- 
ation to equilibrium and dynamics are likely to be prob- 
lematic. In addition, the use of this procedure as part 
of an O(N) density functional scheme would encounter 
other problems, as we point out in Sec. [v|. 

B. LNV minimisation 

The scheme of Li, Nunes and Vanderbilt@ also makes 
use of the purification algorithm, but in a completely 
different way. If we work at constant p, then the strategy 
is based on minimisation of 17 = Tr (H — pl)p, subject 
to the condition that p is weakly idempotent (this means 
that its eigenvalues A satisfy < A < 1). In this domain 
of matrices, the minimum value of 17 is obtained when 
p is given by eqn (^|). This is easily seen by writing 17 
in a representation in which H is diagonal (but p is not 
assumed to be diagonal): 

ft = ^( e n ~ p)Pnn ■ (H) 

n 

The diagonal elements of a weakly idempotent matrix 
must lie in the range [0, 1], so that the minimum of 17 
is obtained when p nn = 1 for e n < p and p nn = for 
e n > p. In this case, it is readily shown that all the off- 
diagonal elements of p nn must vanish, and p is given by 
eqn |). 

The constraint of weak idempotency can be achieved 
by expressing p as in eqn (||), provided the eigenvalues 
of p lie in the range (—5, §)■ In this role, p is simply an 
auxiliary matrix, introduced solely to satisfy weak idem- 
potency. Then 17 can be written as: 

n = Tt (H - pl)(3p 2 - 2p 3 ) . (12) 

But since this is a cubic form in p, it can have only a sin- 
gle minimum, which is obtained when p = p = 6{pI—H). 
In practice, it is often more convenient to minimise the 



energy E = Tr H(3p 2 — 2p 3 ) subject to the constraint that 
the number of occupied states N = Tr (3/5 2 — 2p 3 ) is held 
constant. This requires more computational effort, since 
the gradient of the electron number with respect to the 
density matrix must be evaluated; there are several effi.- 
cicnt implementations of the constant- Nq constraintE3E3. 

The great advantage of LNV over McWeeny is that it 
is variational, and this means that it works even when a 
spatial cut-off is imposed on p. Indeed, it is one of the 
standard methods of achieving linear-scaling behaviour 
in tight-binding calculationsEl With a cut-off, the min- 
imum of 17 or E is guaranteed to be above the exact 
ground-state value, and this minimum decreases mono- 
tonically to the exact value as the cut-off is increased. 
The variational property means that forces on the atoms 
calculated at the minimum are exactly consistent with 
the variations of 17 or E. 

Nevertheless, the LNV techique does have several 
weaknesses, all of which affect the process of searching 
for the ground state. An obvious weakness is that it 
does not have the quadratic convergence shown by the 
McWeeny technique. Any iterative method used to min- 
imise 17 or E will give linear convergence, so that as we 
approach the minimum the error in p at each iteration is 
some fraction of the error at the previous iteration. This 
means that - at least in the absence of a spatial cut-off - 
LNV is expected to need more iterations than McWeeny 
to achieve a given accuracy. 

This weakness is exacerbated by the fact that the LNV 
technique demands more operations in each iteration, as 
has been emphasised by PMEl This is simply because 
we need to calculate the gradient of 17 or E with respect 
to the elements of p, in addition to calculating 17 or E 
itself. So even if it did not need more iterations, LNV 
would still be slower. Roughly speaking, an LNV itera- 
tion takes about twice as long as a McWeeny iteration in 
constant-TVo calculations and somewhat more than this 
in constant-^ calculations. 

There is also a third cause of slowness in LNV, namely 
ill conditioning. In any minimisation problem, conver- 
gence to the minimum will be slow if the curvatures of 
the function are very different in different directions, or 
equivalently if the eigenvalues of the Hessian matrix span 
a wide range. But it is readily shown that the curvatures 
of 17 are determined by the quantities | e n — p , which 
will indeed span a wide range unless the system has a 
large band-gap. We should expect this weakness to be 
particularly troublesome for metallic systems, and in fact 
detailed evidence for the inefficiency of LNV for such sys- 
tems has already been presented!! 3 ! The McWeeny tech- 
nique does not suffer from this problem. 

In addition to these problems of convergence speed, 
LNV has two other obstacles: initialisation and poor 
robustness. It is far from clear how best to initialise 
the density matrix within LNV; in general, the-aasate of 
p = ^1 is made within orthogonal tight bindingaEZI. Poor 
robustness arises since 17 is a cubic form in p and is un- 
bounded below. In order for a minimisation method to 
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lead to the minimum of SI, the initial p must be chosen 
close enough to the minimum. If we start from an unsuit- 
able initial p, any downhill search method will lead away 
from the minimum, and SI will plunge towards infinitely 
negative values. In a robust search strategy we expect to 
seek the minimum of in a sequence of search directions. 
It is a sign of danger if SI has a point of inflection but 
no minimum in a search direction, and we shall use this 
idea later when discussing robustness. 

We show in the next section how a combination of 
McWeeny and LNV allows their strengths to be exploited 
and their weaknesses to be avoided. 



III. RELATIONSHIP BETWEEN MCWEENY 
AND LNV 

In analysing the relationship between the two methods, 
we find it helpful to regard matrices as elements of a 
vector space. Using this viewpoint, we shall make free use 
of geometrical concepts such as the straight line joining 
two matrices A and B, by which we mean the set of 
matrices (1 — A) A + XB, where < A < 1. We define the 
magnitude or norm of a matrix A as: 

|| A ||= [Tr(AU)] 1 / 2 , (13) 

where A' is the Hermitian conjugate of A. This allows us 
to talk of the 'distance' 1 1 A — B \ \ between two matrices. 
We shall also need the notion of scalar product of two 
matrices, defined as: 

(A, B) = Tr (J$B) . (14) 

Matrices are referred to as 'orthogonal' if (A, B) = 0. 

In the vector space just defined, there is a manifold 
consisting of all matrices that are idempotent, and we 
call this the idempotency surface. For any suitably cho- 
sen matrix A^\ the purification algorithm (Eq. (|^)) gen- 
erates a sequence of matrices AW, A^ 2 \ ... which tend to 
a limit A^°°^ lying on the idempotency surface. If we 
imagine this sequence of points in the vector space as 
joined by straight lines, then we form a path which we 
refer to as the McWeeny path. 

The following two statements, proved in Appendix |a|, 
will play a key role in our proposed strategy for finding 
the ground state: 

1. All McWeeny paths meet the idempotency surface 
orthogonally; 

2. For any point on the idempotency surface, the gra- 
dient of the LNV function is tangential to the sur- 
face. 

A picture illustrating these statements is shown in Fig. |^. 

To explain more precisely what these statements mean, 
we need to define the concept of tangent planes to the 
idempotency surface. For any point P on this surface 



(P 2 = P) , consider points on the straight line P = P + 
aB, where B is some chosen Hermitian matrix and a is a 
real scalar variable. In general, P will not be idempotent, 
but for some choices of B, P is idempotent to linear order 
in a: 

P 2 -P = a 2 B 2 . (15) 

For such choices of B, the shortest distance between a 
given point on the straight line and the idempotency sur- 
face is of order a 2 , and we can say that the straight line 
is a tangent line to the idempotency surface. The tan- 
gent plane at the point P consists of all matrices P on 
all tangent lines passing through P. A convenient way 
to construct points in the tangent plane is described in 
Appendix |a|. 

Now suppose we have a McWeeny sequence A^ going 
to the limit A^°°\ Then the meaning of statement (1) is 
that in the k — > oo limit the difference vector A^ — A^°°^ 
becomes orthogonal to every vector B for which A^+B 
is in the tangent plane passing through A^- 00 ^: 

lim (B,A^ -A^l) /|| A^ - || = 0. (16) 

k — >oo / 

The meaning of statement (2) is that, if F is the gradient 
of the LNV function at point P on the idempotency sur- 
face, then P + aF is in the tangent plane passing through 
P. The corollary is that for idempotent P the LNV gra- 
dient is orthogonal to the McWeeny path. 

Consider the implications of the two statements. We 
know that purification is a completely robust way of 
reaching idempotency. Statement (1) implies that it is 
also very direct. As we approach the idempotency sur- 
face, we are following the shortest possible path. Once 
we are near the surface, purification is tantamount to 
dropping a perpendicular onto the surface. In addition, 
quadratic convergence means that the approach to the 
surface accelerates rapidly in the final stages. But for a 
general starting point purification does not give us the 
ground state. It gives us an idempotent density matrix, 
but not the idempotent density matrix corresponding to 
the ground state. This is where LNV comes in. Once we 
are on the idempotency surface, the two statements guar- 
antee that application of LNV does not undo what was 
achieved by McWeeny, because it keeps us on the surface 
to first order. Furthermore, LNV is a completely robust 
way of finding the ground state if we are constrained 
to the surface. Within the idempotency constraint, the 
LNV function has only a single minimum, and cannot fall 
below the ground state energy. 

We have assumed up to now that everything is done 
exactly, without any spatial cut-off. In this case, pu- 
rification by itself is enough to find the ground state. As 
stressed by PM (see Sec. ||), an initial guess for p^ given 
by eqns (^) and ( |l0[ ) guarantees that repeated purifica- 
tion delivers the unique density matrix corresponding to 
the ground state. But with a cut-off the situation is quite 
different. Purification then brings us near to the ground 
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state, but leaves us with a ground-state estimate lacking 
variational properties. At this point, we suggest that the 
effective strategy is to switch to LNV. Since we are al- 
ready near the ground state, minimisation of the LNV 
function should be rapid. Although we are not exactly 
on the idempotency surface, the energy gradient should 
still maintain idempotency to good accuracy, and there 
should be no danger of approaching the unstable region 
where the LNV function decreases unboundedly. 

An important benefit of combining McWeeny and LNV 
in the way we suggest is that it makes it easier to work 
at constant electron number, which is usually what one 
wishes to do. Although the modifications proposed by 
PM make it straightforward to perform McWeeny pu- 
rification at constant No, the minimization of the LNV 
function is made more complicated by the need to hold 
Nq fixed. But once we are on the idempotency surface, 
this difficulty in the LNV scheme disappears. This can 
be seen by recalling that N = Tr [3/5 2 — 2/5 3 ], so that 
dNo/dpij — 6(p—p 2 )ij, which vanishes if p is idempotent. 
The implication is that, since the LNV gradient keeps p 
near the idempotency surface once it has been brought 
there by McWeeny purification, the electron number will 
automatically maintain itself almost constant during the 
LNV stage. Practical tests of this will be shown in the 
next section. 

In summary, the proposed hybrid strategy capitalizes 
on the expected robustness and speed of purification, but 
avoids its lack of variational properties. Variational be- 
haviour is supplied by the LNV component of the strat- 
egy, but since this is used only in the final stages, we 
hope to avoid the instability and slowness from which 
LNV can suffer if used alone. An important feature to 
note is that, as the cut-off radius increases, we expect 
the hybrid strategy to become increasingly effective: al- 
most all the work is then done by purification, so that 
the strengths of McWeeny and LNV should complement 
each other better and better. 



IV. PRACTICAL EXAMPLES 

In this section we present examples of calculations done 
using the proposed hybrid approach, compared with pure 
LNV calculations; these examples form a sequence of 
systems of increasing difficulty: bulk, diamond-structure 
carbon and silicon; a relaxed carbon vacancy; the Si(OOl) 
surface; and finally liquid silicon. 

There are two main characteristics that we are looking 
for in these tests: speed and robustness. The speed of 
a method can be gauged either by the total number of 
iterations taken, or by the total CPU time used. These 
give different measures, since each McWeeny iteration 
requires only about half as much time as an LNV itera- 
tion. When comparing different methods, we shall char- 
acterize the speed by the number of iterations needed to 
reach the ground state within a specified tolerance, and 



the implications for CPU time will be pointed out where 
appropriate. 

R obus tness is harder to characterize. As explained in 
Sec. II B , the LNV method can become unstable if the 
grand potential f2 has no minimum in a search direction, 
because this signals that we are approaching a dangerous 
region where £1 decreases unboundedly. This behaviour 
is, indeed, found on occasions in calculations based on 
pure LNV, particularly when the electron number is kept 
constant (as opposed to the electron chemical potential). 
The kind of robustness we are looking for in our hybrid 
method is therefore the absence of this kind of instability. 
Rather than repeat the refrain throughout each section, 
we will state now that we have not seen this behaviour 
at any time during these simulations. 

To perform the tests, we have used a simple, nearest- 
neighbour orthogonal tight binding ppdel, with param- 
eterisations for siliconEj and carborEll. We adapted an 
implementation of the LNV scheme by GoringeEZI. In 
this scheme, the density matrix cut-off is not defined as 
a sphere, but rather in terms of a cluster of atoms. This 
cluster is formed by including all atoms which are within 
range of a certain number of 'hops' of the central atom 
(e.g. a nearest neighbour is at one hop, and its near- 
est neighbours are at two hops from the central atom). 
The localisation criterion is thus specified in terms of 
the number of hops. All McWeeny minimisations have 
been carried out at fixed electron number; except where 
stated, the same is true for LNV minimisations and LNV 
stages of hybrid minimisations. 

The initial density matrix for the pure LNV method 
was chosen to be il, while for the hybrid and McWeeny 
methods it was constructed according to equations (j^) 
and (fiol). When using the hybrid method, the switch 
to the LNV phase from the McWeeny phase occurred 
when the energy in one iteration was higher than that in 
the previous iteration (which is indicative of truncation 
error) or an equivalent error was found in the mainte- 
nance of election number (as described by Palser and 
Manolopoulosc3) . As the density matrix had been shown 
to be invalid, the density matrix from the previous itera- 
tion was passed to the LNV phase as an initial matrix. 



A. Perfect Si and C crystals 

We have used our proposed hybrid scheme and the 
pure LNV scheme to find the electronic ground state for 
carbon and silicon in the diamond crystal structure for 
different cut-off radii of the density matrix (3, 5 and 7 
hops). Tables | and give the number of iterations taken 
by the two methods to achieve a specified tolerance in 
fractional change in cohesive energy between line min- 
imisations (in this case, 10~ 8 ). This is a standard type 
of criterion applied in practical minimisation, and is used 
here to show the performance of the methods in practical 
tests; in both cases, the methods are achieving the ground 
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state, which ensures that a fair comparison is made. For 
the hybrid scheme, we give separately the numbers of it- 
erations needed in the McWeeny and LNV stages of the 
calculation, while for the LNV scheme we simply give the 
total number of iterations. As described above in sect. 

P6 



fixed n at the value of VA D • WE/VN a ■ VN Q , which is 
the correct definition of the chemical potential for elec- 
trons. The final column of Tabic III shows the deviation 



tions 



II A and II B 



the initialisations used are the PI 
for the hybrid initialisation and |l for the LNV ini- 
tialisation. Several points should be noted. First, the 
total numbers of iterations are the essentially the same 
for both schemes. As pointed out above, this means that 
the hybrid scheme is significantly faster (by about 25 
%). Second, the LNV stage of the hybrid scheme takes 
fewer iterations as the cut-off radius is extended. This 
is expected, because the McWeeny stage should bring 
the density matrix closer to the ground state for larger 
radii. To confirm this point, we have calculated the norm 
(see Eq. (13)) of the difference between the density ma- 
trix obtained at the end of the McWeeny stage and final 
ground-state density matrix. This norm, reported in the 
last column of Tables | and EL decreases markedly with 
increasing cut-off radius. Finally, it is worth noting that 
the Si crystal takes somewhat more iterations that the C 
crystal, as might be expected because of its smaller band 
gap. 

The way in which the energy converges to its ground- 
state value in the pure LNV scheme and in the LNV 
stage of the hybrid scheme is shown in Figures || and ^, 
where we report the difference between the cohesive en- 
ergy at each iteration and the final ground-state cohesive 
energy, as a fraction of the final cohesive energy. The 
results show very clearly that the McWeeny stage gets 
closer and closer to the correct ground state as the cut- 
off radius increases. The rate of convergence in the LNV 
stage is essentially the same as that found in the pure 
LNV scheme, as expected. In all cases, the error in the 
energy decrease approximately exponentially with itera- 
tion number. 



B. Carbon Vacancy 

When a vacancy is introduced into diamond-structure 
carbon, the degeneracy of the dangling-bond states is 
broken by a Jahn- Teller distortion which lowers the sym- 
metry of the system, and defect states appear in the band 
gap. This system therefore gives us an interesting in- 
crease in complexity compared with the perfect crystal. 



Table III reports the numbers of iterations taken by the 



hybrid and pure LNV schemes. The behaviour is similar 
to that found for the perfect crystal, although for small 
cut-off radii the total number of iterations in the hybrid 
scheme is now slightly larger than for pure LNV. Never- 
theless, the hybrid scheme is still significantly faster than 
pure LNV in terms of CPU time. For the present sys- 
tem, we have examined the consequences of working at 
constant \x rather than constant electron number in the 
LNV stage of the hybrid scheme. To do this, we have 



of total electron number from its nominal value of 252 
(the calculation was done with a system of 63 atoms) in 
the final ground state when the calculation is done like 
this. The very small deviations, which decrease with in- 
creasing cut-off radius, confirm our expectation that Nq 
automatically holds itself almost constant in the LNV 
stage (see Sec. III). 



C. Si(OOl) surface 

The Si(OOl) surface is a complex electronic system. Re- 
bonding between surface atoms causes strong displace- 
ments from perfect-lattice positions and the formation 
of dimers, which themselves become buckled because of 
Jahn- Teller distortiono. These effects give rise to bands 
of gap states. Table ^ shows the numbers of iterations 
required by the hybrid and pure LNV schemes. It is clear 
that in this case the hybrid scheme is significantly faster 
than pure LNV, by a factor of between 3 and 5. We be- 
lieve that the slowness of pure LNV is a man ifest ation of 
ill conditioning caused by gap states (see Sec. [I B), which 
do not appear to affect the McWeeny stage of the hybrid 
method. As in the C vacancy case, we have examined 
the deviations of electron number caused by running the 
LNV stage of the hybrid method in constant-^ mode, and 
these are reported in the final column of Table |y|. Once 
again, they are very small, though not quite so small as 
in the vacancy case. 



D. Liquid silicon 

Liquid Si is a challenging system for any method, since 
it is both disordered and metallic. But the important 
point here is that dynamical simulation must be per- 
formed, so that energy conservation is important, and 
this means that the consistency of energy and ionic forces 
is crucial. We have pointed out in Sec. Ill that this con- 



sistency is ensured by the variational property of both 
pure LNV and our hybrid method. By contrast, pure 
McWeeny is not variational, so we are particularly inter- 
ested to find out the consequences of attempting to use 
pure McWeeny for this system. 

The simulations were done using a repeating cell of 64 
atoms at an initial temperature of 3000 K. As before, the 
orthogonal tight-binding model described in Rcf. ^ was 
used. An important technical point for the pure LNV 
simulations is that the changes in bonding from one step 
to the next are so strong that it is not helpful to use the 
final density matrix from one step as,the initial guess for 
the density matrix at the next stepEl Instead, the stan- 
dard initialisation of ^1 is used. It must be stressed that 
because of the discontinuous changes of bonding caused 
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by the spatial cut-off of the density matrix, energy con- 
servation will be far from perfect whatever method is 
used. The point at issue is therefore the relative quality 
of energy conservation for different methods. 

Another relevant technical point concerns the occupa- 
tion numbers to be used for the electronic states. There 
would be sound physical arguments for using Fermi-Dirac 
occupation numbers corresponding to the temperature of 
the simulation. For the present tests, we have instead set 
the electronic temperature to zero, so that the occupa- 
tion numbers are exactly unity below the Fermi energy 
and exactly zero above, following the theory of previous 
Sections. The motivation for doing this is that it pro- 
vides a more stringent test of 0(N) methods for metallic 
systems, since increase of the electronic temperature has 
the effect of localizing the density matrix, as described 
in Ref. || 

We have performed our /-Si simulations in all three 
possible ways: pure LNV, hybrid, and pure McWeeny, 
using a spatial cut-off of 3 hops on the density matrix. 
We find that the hybrid method is once again faster than 
pure LNV: on average, the hybrid calculations require 
13 McWeeny and 13 LNV iterations per step, while pure 
LNV requires 26 iterations, so that the hybrid method is 
between 25 % and 50 % faster than pure LNV. 

Fig. H shows the results of our tests on energy con- 
servation over a period of 0.1 ps. The variation of the 
total energy is almost exactly the same in the LNV and 
hybrid methods and consists of a steady upward drift of 
~ 0.03 eV/atom during the 0.1 ps period. This is already 
large, since it corresponds to a temperature increase of 
ca. 100 K. But with pure McWeeny, the increase of total 
energy is about 25 times larger, and in our judgment this 
does not give a satisfactory method of doing dynamics 
for this kind of system. We regard this as a compelling 
reason for not using McWeeny by itself. 



V. DISCUSSION 



The theoretical arguments presented in Sees. || and III 
showed that the McWeeny and LNV techniques behave 
in very different ways in the search for the ground state, 
and indicated that a combination of the two techniques 
should be more efficient and robust than either technique 
by itself. The practical tests we have just reported fully 
support these ideas. For the five systems we have studied, 
they confirm that in terms of CPU time the hybrid tech- 
nique is always faster, and sometimes much faster, than 
the LNV technique by itself. They also confirm that the 
hybrid technique is more robust, since the LNV itera- 
tions used in the final stage show no sign of the unstable 
behaviour that can occur if LNV is used thoughout. We 
have seen that McWeeny used by itself, following the pro- 
posals of Palser and ManolopouloscS, can indeed produce 
very accurate results for the ground-state energy, espe- 
cially when the spatial cut-off is increased, but that its 



lack of consistency between energy and ionic forces can 
cause serious problems in dynamical simulations. Im- 
portantly, our results also demonstrate that the hybrid 
method allows one to work at constant electron number 
with greater ease than when LNV is used by itself. 

Although we have chosen to work within orthogonal 
tight-binding theory, the main ideas carry over directbz 
to the non-orthogonal case. Nunes and Vanderbilto 
have already shown how to generalize the LNV scheme 
to perform O(N) tight-binding calculations with non- 
orthogonal orbitals, and Palser and Manolopoulos have 
shown that McWeeny purification can be generalized in 
the same way. We demonstrate in A ppen dix |^ that our 
two key statements presented in Sec. Ill remain valid in 



the non-orthogonal case, provided the scalar product be- 
tween matrices is defined using the appropriate metric. 
The discussion in Appendix B indicates that the only 
significant problem in using our hybrid O(N) scheme for 
doing practical calculations within non-orthogonal tight- 
binding theory is that one needs the inverse of the overlap 
matrix Sij between orbitals. However, we believe that an 
approximate inverse of Sij should suffice, and ways of ob- 
taining a suitable approximation have been proposed by 
Palser and ManolopouloscBEj. 

The ideas we have presented should also he. .anpjicable 
to 0(A) DFT. In a series of recent paperal3'EjE3S we 
have shown how a practical O(N) DFT scheme can be 
constructed, the guiding principle being the minimiza- 
tion of the DFT total energy with respect to the density 
matrix. The-ideas have been implemented in our code 
CONQUESTO (Concurrent O(N) QC/antum Electronic 
Structure Technique), in which the density matrix is rep- 
resented in a basis of localized orbitals which are var- 
ied to minimize the total energy. In the present form 
of CONQUEST, the minimization is performed in three 
nested loops. In the innermost loop, the energy is min- 
imized wtih respect to the elements of the density ma- 
trix, and the two outer loops have the tasks of achieving 
self-consistency and minimizing with respect to the lo- 
calized orbitals. The operations performed in the inner- 
most loop are therefore identical to those performed in 
non-orthogonal tight-binding theory, and the methods we 
have presented here should be applicable without change 
to this part of 0(N) DFT. 

The relevance of the present ideas to O(N) DFT gives 
another reason why the LNV component of our hybrid 
scheme is so important. In order to minimize the total 
energy with respect to the localized orbitals, it is essen- 
tial to have an analytic expression for the gradient of the 
energy with respect to variations of these orbitals. But 
the existence of such an expression for the gradient relies 
crucially on the energy being stationary with respect to 
variations of the density matrix. So once again the vari- 
ational behaviour given by LNV is essential, just as it is 
for calculating forces - and for basically the same reason. 
We are currently studying the improvements that can be 
achieved in O(A) DFT by applying the ideas we have 
presented. 
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VI. CONCLUSIONS 

We have shown that a combination of McWeeny pu- 
rification and LNV minimization methods gives a ro- 
bust and rapid means of searching for the electronic 
ground state in the framework of an O(N) density-matrix 
scheme. The McWeeny stage finds an idempotent density 
matrix quickly and efficiently, and the LNV stage then 
finds the idempotent density matrix which minimises the 
total energy. We have presented examples which demon- 
strate the advantages of this hybrid scheme, and shown 
why both stages are necessary. We have pointed out 
that the main ideas can be generalized to non-orthogonal 
tight-binding theory and O(N) density functional theory. 
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APPENDIX A: PROOF OF THE TWO 
STATEMENTS 

We want to prove the two statements enunciated in 
Sec. ED: (1) all McWeeny paths meet the idempotency 



surface orthogonally; (2) for any point on the idempo- 
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tency surface, the gradient of the LNV function is tan- 
gential to the surface. 

We show first how to characterise displacements within 
and perpendicular to the tangent plane. Suppose P is a 
point on the idempotency surface and let Q = I — P, so 
that P 2 = P and Q 2 = Q. Consider displacements away 
from P represented by P' = P + aSP. Any displacement 
SP can be written as 



where 



SP = SPn + 5P± 



SP\\ = PSPQ + QSPP 
5P X = PSPP + QSPQ 



(Al) 



(A2) 



It is straightforward to show that any displacement of 
the form P' = P + aSPii is in the tangent plane: 

P' 2 =P 2 + a{P5P\\ + 5P\\P) + a 2 5P 2 

= P' + a 2 8 P^ . (A3) 

On the other hand, no displacement of the form P' = P+ 
adPj_ can be in the tangent plane, since a little algebra 
shows that 



P' 2 =P' + a(PSPP - QSPQ) + a 2 5Pl 



(A4) 



But the term linear in a cannot vanish unless PSPP and 
QSPQ vanish separately, which implies that SP± = 0. It 
follows that the displaced point P' = P + a(8P\\ + SP±) 
is in the tangent plane if and only if 6P± = 0. 

Now let P be a point on the idempotency surface, and 
let P + aSP be a point on a McWeeny path that leads 
to P. Consider the matrix P' obtained by performing a 
single purification on P + aSP: 



P' = 3(P + aSP) 2 - 2(P + aSPf 



(A5) 



Because of the property of quadratic convergence, P' 
must deviate from P only by a term of order a 2 . The 
condition that the linear term in P' vanishes is: 



SPP - 2PSPP + PSP = 



(A6) 



If we decompose SP into its tangential and perpendicu- 
lar components by writing SP = 5P\\ + SP±, then this 
condition becomes: 



SP H = 



(A7) 



The implication is that any McWeeny path leading to P 
must approach along a direction of the form SP±. Since 
(5P|| , 5P±) = 0, this proves that all McWeeny paths meet 
the idempotency surface at right angles to the surface. 

We now prove that the gradient of the LNV function 
of eqn ([l2]) is tangential to the surface. We define the 
elements Py of this (negative) gradient as: 



Fij = -dn/dp 



(A8) 



so that from the definition given in eqn ( |l2| ) we have: 

F tj - 3(H'p + pH% - 2{p 2 H' + pH'p + H'p 2 ) l0 , 

(A9) 

where H' = H — pi. If p is a point P on the idempotency 
surface, then: 



F = H'P - 2PH'P + PR' 



(A10) 



It is now straightforward to show that P' = P + aF is a 
tangent line: 

P' 2 = P 2 + a(PF + FP) + a 2 F 2 

= P + a(H'P - 2PH'P + PH') + a 2 F 2 

= P' + a 2 F 2 . (All) 



APPENDIX B: THE NON-ORTHONORMAL 
CASE 

We show here that the two statements enunciated 



in Sec. Ill remain true when we generalize to non- 
orthonormal basis functions. In this case, the energy 
eigenvalues e„ are determined by the generalized eigen- 
value equation: 



N 



c i n 



3 = 1 



N 
3=1 



(Bl) 



where Sij is the overlap matrix. (Our notation gives 
an over-bar to quantities in the non-orthonormal case 
to distinguish them from the corresponding quantities in 
the orthonormal case.) With A*o occupied states, the 
ground-state energy Eq is then determined by: 



Eq = minTr (Hp) 



(B2) 



subject to the constraints that p is a 'generalized projec- 
tor': 



pSp = p 



(B3) 



and that Tr (Sp) — Nq. If we work at constant chemical 
potential, then we minimize the grand potential f2: 



Qq = min Tr (H — pS)p 



(B4) 



subject to p being a generalized projector. This whole 
scheme corresponds precisely to the orthonormal scheme 
through the relations: 



H = S 1/2 HS 1/2 

p = s- 1 / 2 P s- 1 / 2 



(B5) 



From this correspondence, it is clear that the McWeeny 
purification of the density matrix p is accomplished by 
iteration of the algorithm: 







p = ZpSp — 2pSpSp 



(B6) 



(To simplify the notation, we now drop— the over-bars.) 
The scheme of Palser and ManolopoulosEa can be imple- 
mented by starting from an initial density matrix given 
by: 



P 



(o) _ 



\t{»S- x - S~ X HS- X ) + 



(B7) 



as has already been pointed out in Ref. |26|. The constant - 
-ZVo algorithm can also be recast in non-orthonormal form. 

Similarly, the Nunes and Vanderbilt (NV) methocO at 
constant p consists of minimizing f2 given by: 



Sl = Tr(H- pS){3pSp - 2pSpSp) 



of the NV method 
has already been 



(B8) 

for the non- 
discussed in 



This formulation 
orthonormal case 
Refs. fi|fn§ 

The two key statements of Sec. Ill remain valid pro- 
vided we use the appropriate metric for defining scalar 
products of matrices. Instead of eqn (|l4|), we must use 
the definition: 



(A, B) = Tr{A^SBS) , 



(B9) 



which means that the matrices A and B are orthogonal 
if Tr (At SBS) = 0. The 'idempotency surface' must, of 
course, the tak en t o mean the manifold of all matrices 
satsifying eqn ( |B3[ ). The methods of Appendix |A| are 
then straightforwardly repeated to show that McWeeny 
paths meet the idempotency surface orthogonally. 

We now turn to the LNV gradient. Naively, one might 
be inclined to work with the (negative) gradient i*y = 
~d£l/dpji as defined in the orthogonal case. However, as 
has been stresse d re cently by White et alSB, for a metric 
defined by eqn. (B9), it would not be tensorially correct 
to update the density matrix p using a gradient defined 
in this way, since transformations of the basis make /Oy 
behave as a contravariant quantity but the derivative Fij 
as a covariant quantity. Instead, one should work with 
the contravariant gradient $ defined as: 



= (S-'FS- 1 ), 



(BIO) 



If one were to work with the metric defined by eqn. (14) 
then the original gradient of 
two statements in Section 



III 



is correct (though the 
require the metric defined 

by eqn. |B8|) 

It is straightforward to show that $ is tangential to 
the idempotency surface. To do this, note from eqn (BE) 
that the gradient F of f2 is given by: 



Fij = -dfl/dpji 

= 3(SpH' + H'pS),j 
- 2{SpSpH' + SpH'pS 



H'pSpS) 



13 ) 



(Bll) 



where H' = H — pS. If p is a point P on the idempotency 
surface, then this reduces to: 



F = SPH' - 2SPH'PS + H'PS , (B12) 

and the contravariant gradient is: 

$ = PH'S^ 1 -2PH'P + S~ X H'P . (B13) 

It then follows that P' = P + a$ is in the tangent plane: 

P'SP' = P + a{<$>SP + PS<S>) + a 2 <f>S<f> 

= P' + a 2 <PS<P. (B14) 




FIG. 1. The function /(A) = 3A 2 - 2A 3 in the range 
±<A<§. 



McWeeny path 



Tangent plane 




Idempotency surface 



FIG. 2. Geometrical representation of the McWeeny and 
LNV methods, showing that the density matrices generated 
by McWeeny iteration form a path approaching the idempo- 
tency surface orthogonally, and that the gradient of the LNV 
function lies in the tangent plane to the idempotency surface. 
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Hybrid, 3 hops 
Hybrid, 5 hops 
Hybrid, 7 hops 
g Pure LNV, 3 hops 
fl Pure LNV, 5 hops 
o Pure LNV, 7 hops 
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2.0 



Iteration 



FIG. 3. The difference between the cohesive energy at a 
given iteration and the final cohesive energy for the LNV 
stage of the hybrid scheme (solid lines) and the pure LNV 
scheme (dashed lines) for diamond structure carbon. Results 
are shown for different cut-off radii: 3 hops (circles), 5 hops 
(squares) and 7 hops (diamonds). The radii are discussed in 
the text. 
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FIG. 4. The difference between the cohesive energy at a 
given iteration and the final cohesive energy for the LNV 
stage of the hybrid scheme (solid lines) and the pure LNV 
scheme (dashed lines) for diamond structure silicon. Results 
are shown for different cut-off radii: 3 hops (circles), 5 hops 
(squares) and 7 hops (diamonds). The radii are discussed in 
the text. 
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FIG. 5. Change in total energy (eV units) during molecular 
dynamics runs on a 64-atom system of liquid Si for (a) pure 
LNV (solid curve) and hybrid method (dashed curve) and (b) 
pure McWeeny method. 



Radius (hops) 


Iterations (Hybrid) 
McWeeny LNV 


Iterations (LNV) 


Norm 


3 


9 8 


17 


0.049 


5 


10 7 


17 


0.015 


7 


11 5 


16 


0.007 



TABLE I. Aspects of the convergence to the ground state 
for the hybrid scheme and a pure LNV scheme for diamond 
structure carbon. The first column shows the spatial cut-off 
applied to the density matrix. The second and third show 
the number of iterations required in the McWeeny and LNV 
stages of the hybrid scheme, and the fourth the number of 
iterations required by a pure LNV scheme. The last column 
shows the norm of the difference between the final McWeeny 
density matrix and the final density matrix (see Eq. ( |l3| ) in 
Section |in| for the definition of norm) . The convergence crite- 
rion was a fractional difference of 10~ 8 between the cohesive 
energies in successive line minimisations. 
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Radius (hops) 


Iterations (Hybrid) 
McWeeny LNV 


Iterations (LNV) 


Norm 


3 


11 10 


21 


0.078 


5 


12 10 


21 


0.043 


7 


13 9 


22 


0.024 



TABLE II. Aspects of the convergence to the ground state 
for the hybrid scheme and a pure LNV scheme for diamond 
structure silicon. Results are displayed as in Table |. 



Radius (hops) 


Iterations (Hybrid) 
McWeeny LNV 


Iterations (LNV) 


AiV 


3 


10 9 


16 


0.015 


5 


11 8 


18 


0.0006 


7 


12 6 


18 


0.0004 



TABLE III. Aspects of the convergence to the ground state 
for the hybrid scheme and a pure LNV scheme for a relaxed 
vacancy in carbon. The first four columns are as in Table [l| 
while the last column shows the deviation, AN , from the 
total electron number (252) after the LNV stage of the hybrid 
minimisation. 



Radius (hops) 


Iterations (Hybrid) 
McWeeny LNV 


Iterations (LNV) 


AiV 


3 


15 14 


73 


0.0311 


5 


16 13 


79 


0.0213 


7 


17 10 


82 


0.0221 



TABLE IV. Aspects of the convergence to the ground state 
for the hybrid scheme and a pure LNV scheme for the Si(001) 
surface. The first four columns are as in Table |, while the 
last column shows the deviation, AN , from the total electron 
number (168) after the LNV stage of the hybrid minimisation. 
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